A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Authors

  • F. Soltanzadeh, General Linguistics Department, Allameh Tabatabaei University, Tehran, Iran

Abstract:

Named Entity Recognition (NER) is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrid methods. Machine-learning-based methods perform well on Persian if they are trained with good features. To obtain good performance in Conditional Random Field (CRF)-based Persian NER, several syntactic features based on dependency grammar, along with some morphological and language-independent features, have been designed to provide suitable input for the learning phase. In this implementation, the designed features have been applied to a CRF to build our model. To evaluate our system, we used the Persian syntactic dependency Treebank, which contains about 30,000 sentences and was prepared at the NOOR Islamic Science Computer Research Center. This Treebank includes named-entity tags such as Person, Organization, and Location. Our approach achieved 86.86% precision, 80.29% recall, and an 83.44% F-measure, values that are relatively higher than those reported for other Persian NER methods.
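The abstract does not list the individual features, so the following Python sketch only illustrates the general idea under stated assumptions. Each token of a dependency-parsed sentence is turned into a feature dictionary that mixes language-independent features (the word, its length, neighboring words), morphological features (character prefixes and suffixes, plus membership in a small affix list), and syntactic features read off the dependency tree (relation label, head word, number of dependents). The `Token` class, the feature names, the affix lists, and the relation labels (e.g. `SBJ`, `ROOT`) are hypothetical choices for illustration, not the paper's feature set. A list of such dictionaries per sentence is the input format accepted by CRF wrappers such as sklearn-crfsuite; the paper does not state which CRF toolkit it used. The block also includes the standard F-measure (weighted harmonic mean of precision and recall).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

FeatureValue = Union[str, int, float, bool]


@dataclass
class Token:
    form: str    # surface form of the word
    head: int    # 1-based index of the syntactic head; 0 means the root
    deprel: str  # dependency relation to the head (label set is illustrative)


# Hypothetical affix lists for illustration only; not the paper's inventory.
PREFIXES = ("می", "نمی", "بی")
SUFFIXES = ("ها", "ان", "ستان")


def token_features(sent: List[Token], i: int) -> Dict[str, FeatureValue]:
    """Build a CRF feature dictionary for token i of a parsed sentence."""
    tok = sent[i]
    w = tok.form
    head: Optional[Token] = sent[tok.head - 1] if tok.head > 0 else None
    return {
        "bias": 1.0,
        # Language-independent features: the word itself, its shape and context.
        "word": w,
        "length": len(w),
        "is_digit": w.isdigit(),  # str.isdigit() also accepts Persian digits
        "prev_word": sent[i - 1].form if i > 0 else "<BOS>",
        "next_word": sent[i + 1].form if i + 1 < len(sent) else "<EOS>",
        # Morphological features: raw character affixes plus known-affix flags.
        "prefix2": w[:2],
        "suffix2": w[-2:],
        "has_known_prefix": any(w.startswith(p) for p in PREFIXES),
        "has_known_suffix": any(w.endswith(s) for s in SUFFIXES),
        # Syntactic features derived from the dependency tree.
        "deprel": tok.deprel,
        "head_word": head.form if head is not None else "<ROOT>",
        "head_deprel": head.deprel if head is not None else "<ROOT>",
        "n_dependents": sum(1 for t in sent if t.head == i + 1),
    }


def sent2features(sent: List[Token]) -> List[Dict[str, FeatureValue]]:
    """One feature dict per token: the sequence a linear-chain CRF consumes."""
    return [token_features(sent, i) for i in range(len(sent))]


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy parse of "علی به تهران رفت" ("Ali went to Tehran"); heads and labels are made up.
    sentence = [
        Token("علی", 4, "SBJ"),
        Token("به", 4, "VPP"),
        Token("تهران", 2, "POSDEP"),
        Token("رفت", 0, "ROOT"),
    ]
    for feats in sent2features(sentence):
        print(feats["word"], feats["deprel"], feats["head_word"])
    print(round(f_measure(86.86, 80.29), 2))
```

Plugging the reported precision (86.86) and recall (80.29) into `f_measure` gives about 83.45, which agrees with the reported 83.44% up to rounding of the precision and recall figures, assuming the paper reports the balanced F1 score.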

similar resources

Named Entity Recognition in Bengali: A Conditional Random Field Approach

This paper reports on the development of a Named Entity Recognition (NER) system for Bengali using the statistical Conditional Random Fields (CRFs). The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the various named entity (NE) classes. A portion of the partially NE tagged Bengali news corpus, develope...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Named entity recognition without domain-knowledge using conditional random fields

This paper addresses the problem of not using any domain knowledge in named entity recognition (NER) tasks. Experiments on two well-known datasets show that the currently most widely used technique, conditional random fields (CRF), achieves respectable results. It is discussed whether it is acceptable to give up better results in order to obtain results in a faster and more modular way. 1. Conditional Random ...

Disease Named Entity Recognition Using Conditional Random Fields

Named Entity Recognition is a crucial component in bio-medical text mining. In this paper, a method for disease Named Entity Recognition is proposed which utilizes sentence- and token-level features based on Conditional Random Fields, using the NCBI disease corpus. The feature set used for the experiment includes orthographic, contextual, affixes, n-grams, part-of-speech tags and word normalization. Using t...

Arabic Named Entity Recognition using Conditional Random Fields

The Named Entity Recognition (NER) task consists in determining and classifying proper names within an open-domain text. This Natural Language Processing task proved to be harder for languages with a complex morphology such as the Arabic language. NER was also proved to help Natural Language Processing tasks such as Machine Translation, Information Retrieval and Question Answering to obtain a h...

PersoNER: Persian Named-Entity Recognition

Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language that is spoken by a population of over a hundred million people world-wide. We first present ...

Journal information

Volume 8, issue 2, pages 227-236

Publication date: 2020-04-01
